Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data
Abstract
The Resource Description Framework (RDF) format is used by a large number of scientific applications to store and disseminate their datasets. Provenance information, which describes the source or lineage of the datasets, plays an increasingly significant role in ensuring data quality, computing trust values for the datasets, and ranking query results. Current provenance tracking approaches that use the RDF reification vocabulary suffer from a number of known issues, including a lack of formal semantics, the use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics, which ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex provenance queries improves by three orders of magnitude, while performance for simpler provenance queries remains comparable to that of the RDF reification approach.
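The abstract does not spell out how a provenance-aware triple is encoded, so the following is only a minimal sketch under stated assumptions. It contrasts the standard RDF reification vocabulary (four statement-describing triples on a blank node, plus one provenance triple) with one possible reading of the provenance-context idea, in which the context is embedded in the subject's URI. The `EX` namespace, the `derives_from` property, the entity names, and the `contextualize` helper are all hypothetical and are not taken from the paper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"          # hypothetical namespace
DERIVES_FROM = EX + "derives_from"  # hypothetical provenance property


def reify(triple, source, stmt_id):
    """RDF reification: describe the statement on a blank node, then attach provenance."""
    s, p, o = triple
    stmt = f"_:{stmt_id}"  # blank node standing for the reified statement
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, DERIVES_FROM, source),
    ]


def contextualize(triple, source, context_id):
    """Assumed context-entity encoding: mint a context-specific subject URI
    and link that entity to its source. No blank nodes are needed."""
    s, p, o = triple
    s_ctx = f"{s}_{context_id}"
    return [
        (s_ctx, p, o),
        (s_ctx, DERIVES_FROM, source),
    ]


# Hypothetical example data.
triple = (EX + "entity_a", EX + "affects", EX + "entity_b")
source = EX + "source_document_1"

reified = reify(triple, source, "stmt1")
contextual = contextualize(triple, source, "source_document_1")

print(len(reified))     # 5
print(len(contextual))  # 2
```

The counts apply to this toy example only and are not the paper's measurements. The sketch is meant to show why an encoding that avoids the reification vocabulary can need fewer provenance-specific triples and no blank nodes.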
Similar resources
RDFProv: A relational RDF store for querying and managing scientific workflow provenance
Article history: received 12 October 2008; received in revised form 8 March 2010; accepted 11 March 2010; available online 23 March 2010. Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of...
Scientific Workflow Provenance Metadata Management Using an RDBMS-based RDF Store
Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power...
Reflections on Provenance Ontology Encodings
As more data (especially scientific data) is digitized and put on the Web, the importance of tracking and sharing its provenance metadata grows. Besides capturing the annotation properties of data, provenance research also emphasizes interlinking relevant data. Therefore, it is desirable to make provenance metadata easy to access, share, reuse, integrate and reason with. To address these requir...
Provenance and Annotations for Linked Data
Provenance tracking for Linked Data requires the identification of Linked Data resources. Annotating Linked Data on the level of single statements requires the identification of these statements. The concept of a Provenance Context is introduced as the basis for a consistent data model for Linked Data that incorporates current best-practices and creates identity for every published Linked Datas...
Scientific Workflow Provenance Metadata Management Using an RDBMS
Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power...
Journal title:
- Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management
Volume: 6187
Publication date: 2010